
    BIGhybrid: A Simulator for MapReduce Applications in Hybrid Distributed Infrastructures Validated with the Grid5000 Experimental Platform

    Cloud computing has increasingly been used as a platform for running large business and data processing applications. Conversely, Desktop Grids have been successfully employed in a wide range of projects, because they are able to take advantage of a large number of resources provided free of charge by volunteers. A hybrid infrastructure created from the combination of Cloud and Desktop Grid infrastructures can provide a low-cost and scalable solution for Big Data analysis. Although frameworks like MapReduce have been designed to exploit commodity hardware, their ability to take advantage of a hybrid infrastructure poses significant challenges due to its large resource heterogeneity and high churn rate. This paper proposes BIGhybrid, a simulator for two existing classes of MapReduce runtime environments: BitDew-MapReduce, designed for Desktop Grids, and BlobSeer-Hadoop, designed for Cloud computing. The goal is to carry out accurate simulations of MapReduce executions in a hybrid infrastructure composed of Cloud computing and Desktop Grid resources. This work describes the principles of the simulator and the validation of BIGhybrid with the Grid5000 experimental platform. With BIGhybrid, developers can investigate and evaluate new algorithms to enable MapReduce to be executed in hybrid infrastructures, including topics such as resource allocation and data splitting. (Concurrency and Computation: Practice and Experience)
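    As a rough illustration of what such a simulator models (not code from BIGhybrid itself), the sketch below schedules map tasks over a mix of stable cloud nodes and volatile desktop-grid nodes, re-queuing tasks lost to churn; the class and function names, speeds, and failure probabilities are all hypothetical.

```python
import random

class Node:
    """A simulated worker: cloud nodes are stable, desktop-grid nodes may churn."""
    def __init__(self, name, speed, churn_prob):
        self.name = name              # hypothetical identifier
        self.speed = speed            # tasks processed per simulated time unit
        self.churn_prob = churn_prob  # probability the node leaves mid-task

def simulate_map_phase(num_tasks, nodes, seed=42):
    """Greedy earliest-finish scheduling; churned tasks are re-queued."""
    rng = random.Random(seed)
    finish_time = {n.name: 0.0 for n in nodes}
    pending = list(range(num_tasks))
    while pending:
        task = pending.pop()
        node = min(nodes, key=lambda n: finish_time[n.name])
        duration = 1.0 / node.speed
        if rng.random() < node.churn_prob:
            # Node left before completing the task: lose some time, retry elsewhere.
            finish_time[node.name] += duration * rng.random()
            pending.append(task)
        else:
            finish_time[node.name] += duration
    return max(finish_time.values())  # makespan of the simulated map phase

if __name__ == "__main__":
    cloud = [Node(f"cloud-{i}", speed=2.0, churn_prob=0.0) for i in range(4)]
    desktop = [Node(f"dg-{i}", speed=1.0, churn_prob=0.2) for i in range(16)]
    print("simulated makespan:", simulate_map_phase(200, cloud + desktop))
```

    Increasing the desktop-grid churn probability shows how re-executions inflate the simulated makespan, which is the kind of trade-off such a simulator lets developers explore before changing a scheduling or data-splitting policy.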

    BIGhybrid - A Toolkit for Simulating MapReduce on Hybrid Infrastructures

    Cloud computing has increasingly been used as a platform for running large business and data processing applications. Although clouds have become highly popular, when it comes to data processing the cost of usage is not negligible. Conversely, Desktop Grids have been used by a plethora of projects, taking advantage of the high number of resources provided for free by volunteers. Merging cloud computing and desktop grids into a hybrid infrastructure can provide a feasible low-cost solution for big data analysis. Although frameworks like MapReduce have been conceived to exploit commodity hardware, their use on a hybrid infrastructure poses some challenges due to large resource heterogeneity and high churn rate. This study introduces BIGhybrid, a toolkit to simulate MapReduce on hybrid environments. The main goal is to provide a framework for developers and system designers to address the issues of hybrid MapReduce. In this paper, we describe the framework, which simulates the assembly of two existing middleware systems: BitDew-MapReduce for Desktop Grids and Hadoop-BlobSeer for Cloud Computing. Experimental results included in this work demonstrate the feasibility of our approach.

    Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

    A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, Flink, and others. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, the Spark Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the evaluation points out the Spark Streaming limitations that lead to in-memory-based issues for data-intensive pipelines and stateful applications, and indicates potential solutions.
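    For context, the Spark Streaming backpressure mechanism evaluated here is enabled purely through configuration; a minimal PySpark sketch is shown below, where the rate limits and the socket source are illustrative rather than taken from the paper's setup.

```python
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# Enable Spark Streaming backpressure so the ingestion rate adapts to
# the observed processing speed instead of a fixed receiver rate.
conf = (SparkConf()
        .setAppName("backpressure-demo")
        .set("spark.streaming.backpressure.enabled", "true")
        # Illustrative bounds: the rate used before the first feedback loop,
        # and an upper cap per receiver (records per second).
        .set("spark.streaming.backpressure.initialRate", "1000")
        .set("spark.streaming.receiver.maxRate", "10000"))

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=1)  # 1-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # hypothetical source
lines.count().pprint()  # simple stateless stage for demonstration

ssc.start()
ssc.awaitTermination()
```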

    Making a country of readers: an analysis of the National Book and Reading Plan (PNLL)

    This article is the result of an exploratory analysis of public policies on the promotion of reading in Brazil. We consider the National Book and Reading Plan (PNLL), which, in force since 2006, has guided and shaped the current state and municipal reading-promotion policies in the country, taking its guiding conceptions and its format as our object of analysis and discussion. To that end, we used information from the three spheres of government regarding the PNLL and other programs, as well as statistical information from different sources. The text presents an analysis of the statistical information available on reading as a "cultural" practice; it then presents a characterization of the PNLL and its implications for the plans, programs, and actions that promote reading. This analysis emphasizes how the Plan's format is tied to governmental impasses over how to fund reading-promotion policies. Finally, in the concluding remarks, we return to the main issues addressed, pointing out the limitations of the Plan. Keywords: reading; public policies; National Book and Reading Plan (PNLL). Article received 20 Oct. 2015. JEL classification: Z1

    The singularity of cultural practices: an interview with Bernard Lahire


    Effects of organic fertilization and harvest time on raw material quality and on stalk and artisanal brown sugar yields of two sugarcane cultivars (plant cane)

    This work was conducted to study the effects of three fertilization systems (30 t ha-1 of cattle manure; 3.5 t ha-1 of chicken manure; and chemical fertilization with 120 kg ha-1 of P2O5 and of K2O at planting plus 60 kg ha-1 of N as top dressing) and three harvest times (July, August, and September 2003) on raw material quality and on the stalk and brown sugar yields of two sugarcane cultivars (SP79-1011 and RB72454). The experiment was set up in an area of Alambique JM, in Perdões, Minas Gerais, Brazil. The experimental design was randomized blocks in a 2 x 3 x 3 factorial scheme with three replications. The fertilization systems had no effect on the stalk and brown sugar yields of the cultivars studied. Harvest time affected stalk yield, with August and September standing out; for brown sugar yield, however, no difference was observed. Thus, under the conditions of this work, replacing chemical fertilization with organic fertilization (cattle or chicken manure) is feasible, with no losses in raw material quality or in stalk and artisanal brown sugar yields, and August and September were the months that provided the best-quality raw material and the highest stalk yields.

    D³-MapReduce: Towards MapReduce for Distributed and Dynamic Data Sets

    Since its introduction in 2004 by Google, MapReduce has become the programming model of choice for processing large data sets. Although MapReduce was originally developed for use by web enterprises in large data centers, the technique has gained a lot of attention from the scientific community for its applicability to large parallel data analysis (including geographic data, high energy physics, genomics, etc.). So far, MapReduce has mostly been designed for batch processing of bulk data. The ambition of D³-MapReduce is to extend the MapReduce programming model and propose efficient implementations of this model to: i) cope with distributed data sets, i.e. data that span multiple distributed infrastructures or are stored on a network of loosely connected devices; and ii) cope with dynamic data sets, i.e. data that change over time or can be incomplete or only partially available. In this paper, we draw the path towards this ambitious goal. Our approach leverages the Data Life Cycle as a key concept to provide MapReduce for distributed and dynamic data sets on heterogeneous and distributed infrastructures. We first report on our attempts at implementing the MapReduce programming model for Hybrid Distributed Computing Infrastructures (Hybrid DCIs). We present the architecture of the prototype based on BitDew, a middleware for large-scale data management, and Active Data, a programming model for data life cycle management. Second, we outline the challenges in terms of methodology and present our approaches based on simulation and emulation on the Grid'5000 experimental testbed. We conduct performance evaluations and compare our prototype with Hadoop, the industry-reference MapReduce implementation. We present our work in progress on dynamic data sets, which has led us to implement an incremental MapReduce framework. Finally, we discuss our achievements and outline the challenges that remain to be addressed before obtaining a complete D³-MapReduce environment.
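    To make the incremental MapReduce idea mentioned at the end concrete, here is a minimal, self-contained sketch (not the paper's framework): per-key reduce results are kept as state, and only newly arrived records are mapped and merged into it. All names are hypothetical, and the reduce function is assumed to be associative.

```python
from collections import defaultdict

def map_fn(record):
    """Hypothetical map: emit (word, 1) pairs for a word-count workload."""
    for word in record.split():
        yield word, 1

def reduce_fn(a, b):
    """Hypothetical associative reduce: sum partial counts."""
    return a + b

class IncrementalMapReduce:
    """Keeps per-key partial results so new data is merged, not recomputed."""
    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.state = {}  # key -> reduced value over all data seen so far

    def update(self, new_records):
        partial = defaultdict(list)
        for record in new_records:           # map only the new delta
            for key, value in self.map_fn(record):
                partial[key].append(value)
        for key, values in partial.items():  # reduce the delta, then merge
            delta = values[0]
            for v in values[1:]:
                delta = self.reduce_fn(delta, v)
            if key in self.state:
                self.state[key] = self.reduce_fn(self.state[key], delta)
            else:
                self.state[key] = delta
        return self.state

if __name__ == "__main__":
    imr = IncrementalMapReduce(map_fn, reduce_fn)
    imr.update(["map reduce map"])        # first batch
    print(imr.update(["reduce again"]))   # only the new batch is processed
```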

    SMART: An Application Framework for Real Time Big Data Analysis on Heterogeneous Cloud Environments

    The amount of data that human activities generate poses a challenge to current computer systems. Big data processing techniques are evolving to address this challenge, with analysis increasingly being performed using cloud-based systems. Emerging services, however, require additional enhancements in order to ensure their applicability to highly dynamic and heterogeneous environments and to facilitate their use by Small and Medium-sized Enterprises (SMEs). Observing this landscape in emerging computing system development, this work presents Small & Medium-sized Enterprise Data Analytic in Real Time (SMART), which addresses some of the issues in providing compute service solutions for SMEs. SMART offers a framework for the efficient development of Big Data analysis services suitable for small and medium-sized organizations, considering very heterogeneous data sources, from wireless sensor networks to data warehouses, and focusing on service composability for a number of domains. This paper presents the basis of this proposal and preliminary results on application deployment on a hybrid infrastructure.
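    The notion of service composability described above can be pictured with a small, purely illustrative sketch (not the SMART API): each analysis step is a function with a uniform record-stream signature, so a domain pipeline is assembled by chaining adapters and filters. All names, schemas, and thresholds here are assumptions.

```python
from typing import Callable, Iterable

# Each composable service consumes and produces a stream of records (dicts).
Service = Callable[[Iterable[dict]], Iterable[dict]]

def compose(*services: Service) -> Service:
    """Chain services so the output of one feeds the next."""
    def pipeline(records: Iterable[dict]) -> Iterable[dict]:
        for service in services:
            records = service(records)
        return records
    return pipeline

def sensor_adapter(readings: Iterable[dict]) -> Iterable[dict]:
    # Normalize raw wireless-sensor readings to a common schema.
    return ({"source": "wsn", "value": r["temp_c"]} for r in readings)

def threshold_filter(records: Iterable[dict]) -> Iterable[dict]:
    # Keep only values above an illustrative alert threshold.
    return (r for r in records if r["value"] > 30.0)

if __name__ == "__main__":
    analyze = compose(sensor_adapter, threshold_filter)
    print(list(analyze([{"temp_c": 25.0}, {"temp_c": 31.5}])))
```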

    Boosting big data streaming applications in clouds with BurstFlow

    The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, leading to new challenges in handling Big Data in real time. Traditionally, a single cloud infrastructure often hosts the deployment of Stream Processing applications because it offers extensive and adaptive virtual computing resources. As a result, data sources send data from distant and diverse locations to the cloud infrastructure, increasing application latency. The cloud infrastructure may be geographically distributed, and it requires running a set of frameworks to handle communication, typically a Message Queue System and a Stream Processing Framework. These frameworks may be deployed across multiple clouds, with each service placed in a different cloud and communication carried over high-latency network links. This creates challenges for meeting real-time application requirements, because the data streams have different and unpredictable latencies, forcing cloud providers' communication systems to continually adjust to environmental changes. Previous works explore static micro-batching and demonstrate its potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication between data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy for distributing incoming streams across available machines by considering memory and CPU capacities. Experiments on a real-world multi-cloud deployment show that BurstFlow can reduce execution time by up to 77% compared to state-of-the-art solutions, while improving CPU efficiency by up to 49%.
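    The two mechanisms named in the abstract can be sketched as follows (a simplified illustration, not BurstFlow's implementation): the micro-batch size is scaled by the ratio of a target latency to the measured communication-plus-computation time, and incoming streams are split across machines in proportion to a CPU-and-memory capacity score. All names, constants, and the scoring formula are assumptions.

```python
def adjust_batch_size(current_size, measured_latency_s, target_latency_s,
                      min_size=100, max_size=100_000):
    """Scale the micro-batch proportionally to how far we are from the target latency."""
    if measured_latency_s <= 0:
        return current_size
    ratio = target_latency_s / measured_latency_s
    new_size = int(current_size * ratio)
    return max(min_size, min(max_size, new_size))

def partition_weights(machines):
    """Share of incoming records per machine, weighted by CPU cores and free memory."""
    scores = {m["name"]: m["cpu_cores"] * m["free_mem_gb"] for m in machines}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

if __name__ == "__main__":
    # The last batch took 2.4 s against a 1.0 s target, so the batch shrinks.
    print(adjust_batch_size(current_size=10_000,
                            measured_latency_s=2.4, target_latency_s=1.0))
    machines = [{"name": "cloud-a", "cpu_cores": 8, "free_mem_gb": 16},
                {"name": "cloud-b", "cpu_cores": 4, "free_mem_gb": 8}]
    print(partition_weights(machines))
```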